A generalized disambiguation algorithm for weighted finite automata and its application to NLP tasks
Abstract
We present a disambiguation algorithm for weighted finite tree automata (FTA). The algorithm converts an ambiguous FTA into an equivalent non-ambiguous one, in which no two accepting paths are labeled with the same tree. The notion of non-ambiguity is similar to that of determinism in automata theory, but we show that disambiguation is applicable to a wider class of weighted automata than determinization. We conduct experiments on Natural Language Processing (NLP) tasks and show that disambiguated automata are much smaller than determinized automata in practice.
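The distinction between non-ambiguity and determinism can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not taken from the paper: it uses unweighted string automata rather than weighted tree automata, and the function name `count_accepting_paths` and the two toy automata are hypothetical. An automaton is ambiguous when some input has more than one accepting path; the second automaton below is nondeterministic yet non-ambiguous.

```python
from collections import defaultdict

def count_accepting_paths(transitions, initial, finals, word):
    """Count accepting paths labeled with `word` (hypothetical helper).

    transitions: iterable of (source, symbol, target) triples.
    initial, finals: sets of states.
    """
    delta = defaultdict(list)
    for src, sym, dst in transitions:
        delta[(src, sym)].append(dst)

    # paths[q] = number of distinct paths from an initial state to q
    # that read the prefix of `word` consumed so far.
    paths = {q: 1 for q in initial}
    for sym in word:
        nxt = defaultdict(int)
        for q, n in paths.items():
            for r in delta.get((q, sym), []):
                nxt[r] += n
        paths = nxt
    return sum(n for q, n in paths.items() if q in finals)

# Ambiguous: "ab" is accepted along two different paths (via state 1 or 2).
ambiguous = [(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "b", 3)]

# Nondeterministic (state 0 has two "a" transitions) but non-ambiguous:
# each accepted string has exactly one accepting path.
unambiguous = [(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "c", 3)]

n_ambiguous = count_accepting_paths(ambiguous, {0}, {3}, "ab")
n_unambiguous_ab = count_accepting_paths(unambiguous, {0}, {3}, "ab")
n_unambiguous_ac = count_accepting_paths(unambiguous, {0}, {3}, "ac")
```

Under these assumptions, a disambiguation procedure would only need to bring every path count down to at most one, which is a weaker requirement than removing all nondeterministic choices.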
Related resources
The correctness of a generalized disambiguation algorithm for finite automata
We present a generalized disambiguation algorithm for finite state automata and prove its correctness. The algorithm can remove ambiguities from both finite state automata and tree automata, which can make them more efficient to use in many applications.
On the Disambiguation of Weighted Automata
We present a disambiguation algorithm for weighted automata. The algorithm admits two main stages: a pre-disambiguation stage followed by a transition removal stage. We give a detailed description of the algorithm and the proof of its correctness. The algorithm is not applicable to all weighted automata but we prove sufficient conditions for its applicability in the case of the tropical semirin...
A disambiguation algorithm for weighted automata
We present a disambiguation algorithm for weighted automata. The algorithm admits two main stages: a pre-disambiguation stage followed by a transition removal stage. We give a detailed description of the algorithm and the proof of its correctness. The algorithm is not applicable to all weighted automata but we prove sufficient conditions for its applicability in the case of the tropical semirin...
A Better N-Best List: Practical Determinization of Weighted Finite Tree Automata
Ranked lists of output trees from syntactic statistical NLP applications frequently contain multiple repeated entries. This redundancy leads to misrepresentation of tree weight and reduced information for debugging and tuning purposes. It is chiefly due to nondeterminism in the weighted automata that produce the results. We introduce an algorithm that determinizes such automata while preserving...